Chapter 11: Hypothesis Testing with Randomization

Overview

  • 11 Hypothesis testing with randomization
    • 11.1 Sex discrimination case study
      • 11.1.1 Observed data
      • 11.1.2 Variability of the statistic
      • 11.1.3 Observed statistic vs. null statistics
    • Simulation using the applet: Two studies
    • 11.2 Opportunity cost case study
      • 11.2.1 Observed data
      • 11.2.2 Variability of the statistic
      • 11.2.3 Observed statistic vs. null statistics
    • 11.3 Hypothesis testing
      • 11.3.1 The US court system
      • 11.3.2 P-value and statistical significance

11.1 Sex discrimination case study

Research Question: Are female employees discriminated against in promotion decisions made by male managers?

  • 48 male supervisors were given a simulated personnel file
  • Response variable: should the employee be promoted?
  • Explanatory variable: sex of employee (randomly assigned)

11.1.1 Observed data

glimpse(sex_discrimination)
Rows: 48
Columns: 2
$ sex      <fct> male, male, male, male, male, male, male, male, male, male, male, male, male, mal…
$ decision <fct> promoted, promoted, promoted, promoted, promoted, promoted, promoted, promoted, p…
table(sex_discrimination)
        decision
sex      promoted not promoted
  male         21            3
  female       14           10

Group Discussion

prop.table(table(sex_discrimination), margin = 1)
        decision
sex       promoted not promoted
  male   0.8750000    0.1250000
  female 0.5833333    0.4166667
  1. Describe the apparent association between the variables sex and decision.

  2. Do you think these proportions indicate evidence of discrimination, or could they just be showing chance variation due to the random components of the study (e.g., sampling and assignment of the supervisors)? How are you deciding?

Statistics, Parameters, Hypotheses

Statistics:

  • \(\hat{p}_F \approx 0.583\) is the proportion of females that were promoted.
  • \(\hat{p}_M = 0.875\) is the proportion of males that were promoted.
  • \(\hat{p}_M - \hat{p}_F \approx 0.292\) is the difference in proportions.
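The three statistics above follow directly from the counts in the contingency table. A minimal base R sketch, assuming we rebuild the table by hand (the names `counts`, `p_hat`, and `obs_diff` are illustrative and do not come from the slides, which use the `sex_discrimination` data frame):

```r
# Rebuild the 2x2 table of counts shown in the observed data.
# `counts` is an illustrative name, not an object from the slides.
counts <- matrix(
  c(21,  3,
    14, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex = c("male", "female"),
                  decision = c("promoted", "not promoted"))
)

# Proportion promoted within each sex (row proportions).
p_hat <- prop.table(counts, margin = 1)[, "promoted"]

# Observed statistic: difference in proportions, male minus female.
obs_diff <- p_hat[["male"]] - p_hat[["female"]]
round(obs_diff, 3)  # 0.292
```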

Parameters:

  • \(p_F\) is the actual probability that a female will be promoted.
  • \(p_M\) is the actual probability that a male will be promoted.
  • \(p_M - p_F\) is the difference in these probabilities.

Hypotheses:

  • \(H_0: p_M - p_F = 0\) (Null Hypothesis: no discrimination)
  • \(H_A: p_M - p_F > 0\) (Alternative Hypothesis: discrimination against women)

11.1.2 Variability of the statistic

  • Observed Statistic: \(\hat{p}_M - \hat{p}_F \approx 0.292\)
  • \(H_0: p_M - p_F = 0\) (Null Hypothesis: no discrimination)
  • \(H_A: p_M - p_F > 0\) (Alternative Hypothesis: discrimination against women)

The observed statistic is in the direction of the alternative hypothesis, but there are two possibilities:

  • \(H_A\) is true: we have evidence of discrimination.
  • \(H_0\) is true, and we are just observing random variation.

Modeling Random Variation

summary(sex_discrimination)
     sex             decision 
 male  :24   promoted    :35  
 female:24   not promoted:13  
  • Make 35 “promoted” cards (red) and 13 “not promoted” cards (blue).
  • Shuffle the cards and randomly assign 24 to male and 24 to female.
  • Compute the difference in proportions for the shuffled cards, and repeat several times.
  • Compare the simulated differences to the observed difference in proportions.
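One round of the card shuffle can be sketched in base R. This is only an illustration under stated assumptions: the vectors `cards` and `sex` are rebuilt from the counts in the summary rather than taken from the `sex_discrimination` data frame, and the seed is arbitrary.

```r
# Counts taken from the summary above: 35 promoted, 13 not promoted.
cards <- c(rep("promoted", 35), rep("not promoted", 13))

# Labels for the 48 personnel files: 24 male and 24 female.
sex <- c(rep("male", 24), rep("female", 24))

set.seed(1)  # arbitrary seed, used only so the shuffle is reproducible

# One shuffle: permute the cards, which breaks any link between
# sex and decision (this is what H_0 claims about the real study).
shuffled <- sample(cards)

# Difference in proportions promoted (male minus female) for this shuffle.
sim_diff <- mean(shuffled[sex == "male"] == "promoted") -
  mean(shuffled[sex == "female"] == "promoted")
sim_diff
```

Each run with a different seed gives a different simulated difference. The spread of those values is the random variation that the observed 0.292 is compared against.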

11.1.3 Observed statistic vs. null statistics

  • The observed statistic is 0.292.
  • We repeat the card shuffling 100 times and compute 100 null statistics.
    • Each null statistic simulates what could happen if \(H_0\) were true.
  • If we rarely see null statistics as extreme as the observed statistic, we have evidence that \(H_0\) is false.

P-value

The simulation-based P-value is the proportion of simulated statistics that were as extreme or more extreme than the observed statistic.

  • We simulated 100 statistics.
  • Only 2 of them were as extreme or more extreme than 0.292.
  • P-value \(= 2/100 = 0.02\)

The smaller the P-value, the more evidence you have against \(H_0\) in favor of \(H_A\).

The P-value is the probability, assuming \(H_0\) is true, that an identical study would produce a statistic at least as extreme as the statistic you observed.
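The simulation-based P-value can be sketched by repeating the shuffle and counting how often a null statistic is at least as large as the observed one. This is a sketch under assumptions: the helper `null_stat()` and the vectors below are illustrative names, the data are rebuilt from the counts in the slides, and the seed is arbitrary. Because the shuffles are random, the result will generally differ from the 2/100 reported above.

```r
# Rebuild the cards and labels from the counts in the slides.
cards <- c(rep("promoted", 35), rep("not promoted", 13))
sex <- c(rep("male", 24), rep("female", 24))

# Observed statistic: 21/24 males promoted minus 14/24 females promoted.
obs_diff <- 21 / 24 - 14 / 24

# Hypothetical helper: one shuffle, one null statistic.
null_stat <- function() {
  shuffled <- sample(cards)
  mean(shuffled[sex == "male"] == "promoted") -
    mean(shuffled[sex == "female"] == "promoted")
}

set.seed(2024)  # arbitrary seed for reproducibility
null_stats <- replicate(100, null_stat())

# Proportion of null statistics at least as extreme as the observed one.
# H_A is one-sided (p_M - p_F > 0), so "extreme" means "at least as large".
# The small tolerance guards against floating-point rounding in the comparison.
p_value <- mean(null_stats >= obs_diff - 1e-9)
p_value
```

Increasing the number of shuffles from 100 to, say, 1000 makes the estimated P-value less sensitive to the particular seed.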

Dolphins and Depression

Dolphin study

  • 30 subjects, randomly assigned into treatment and control.
  • Treatment: Swim in ocean near bottlenose dolphins
  • Control: Swim in ocean without dolphins
  • Response: Did their depression condition improve, or not?

Two Proportion Simulation Applet

https://www.rossmanchance.com/applets/2021/chisqshuffle/ChiSqShuffle.htm?dolphins=1

Group Activity

  1. The applet shows you a contingency table and a standardized barplot of the dolphin data. Record the observed difference in proportions \(\hat{p}_\text{dolphin} - \hat{p}_\text{control}\) between the treatment group and the control group.

  2. Check the box for Show Shuffle Options. Click the Shuffle button and observe what happens. Click it a couple more times and observe.

  3. Enter 1000 in the Number of Shuffles box and click Shuffle. Use the Count Samples box to count the number of samples that are at least as extreme as the observed statistic (enter the observed statistic in the box and click Count). Record the proportion that are at least this extreme (i.e., the P-value).

  4. Do you think the study is just showing random variation, or do you think the treatment group is showing significantly better improvement than the control group?

Penguin Study

Are flipper bands associated with penguins’ survival rates?

Researchers followed a random sample of 30 penguins for a period of time. Some were banded, some were not. The researchers recorded whether each penguin was still alive at the end of the time period.

  • Explanatory variable: Banded/Not Banded
  • Response variable: Alive/Not Alive

Group Activity

  1. Return to the applet, and click on the link for Penguin study.

  2. Check the box for Show Shuffle Options. Enter 1000 in the Number of Shuffles box and click Shuffle. Use the Count Samples box to count the number of samples that are at least as extreme as the observed statistic (enter the observed statistic in the box and click Count). Record the proportion that are at least this extreme (i.e., the P-value).

  3. Do you think the study is just showing random variation, or do you think there is a significant difference in the survival rates between banded and non-banded penguins?

11.3 Hypothesis testing

11.3.1 The US court system

  • Defendants are presumed not guilty unless proven guilty.
    • We prefer to err on the side of letting guilty people go, rather than convicting innocent people.
  • In statistics, we either find evidence for \(H_A\), or we fail to find evidence.
    • In the latter case, it doesn’t mean we’ve proved that \(H_0\) is true, only that we have failed to find convincing evidence that it is false.

11.3.2 P-value and statistical significance

  • General Principle: The smaller the P-value, the more evidence we have for an association.
  • Misguided simplification: A P-value smaller than 0.05 is “significant”.

11.2 Opportunity cost case study

Research Question: Are students influenced by advice about being prudent with their money?

Treatment and Control

  • Control group instructions: pick one:
    • Buy this entertaining video.
    • Not buy this entertaining video.
  • Treatment group instructions: pick one:
    • Buy this entertaining video.
    • Not buy this entertaining video. Keep the $14.99 for other purchases.

Hypotheses

  • \(H_0:\) Null hypothesis. Reminding students that they can save money for later purchases will not have any impact on students’ spending decisions.

  • \(H_A:\) Alternative hypothesis. Reminding students that they can save money for later purchases will reduce the chance they will continue with a purchase.

11.2.1 Observed data

glimpse(opportunity_cost)
Rows: 150
Columns: 2
$ group    <fct> control, control, control, control, control, control, control, control, control, …
$ decision <fct> buy video, buy video, buy video, buy video, buy video, buy video, buy video, buy …
table(opportunity_cost)
           decision
group       buy video not buy video
  control          56            19
  treatment        41            34

Observed Statistic

prop.table(table(opportunity_cost), margin = 1)
           decision
group       buy video not buy video
  control   0.7466667     0.2533333
  treatment 0.5466667     0.4533333

Observed statistic: \(\hat{p}_C - \hat{p}_T \approx 0.747 - 0.547 = 0.2\)
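The same statistic can be computed group by group. The sketch below assumes a data frame rebuilt from the counts in the table above (`oc` is a hypothetical stand-in for `opportunity_cost`) and uses `tapply()` to take the proportion of “buy video” decisions within each group.

```r
# Hypothetical stand-in for opportunity_cost, rebuilt from the table of counts.
oc <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 75)),
  decision = factor(c(
    rep(c("buy video", "not buy video"), times = c(56, 19)),  # control
    rep(c("buy video", "not buy video"), times = c(41, 34))   # treatment
  ))
)

# Proportion who chose "buy video" within each group.
p_hat <- tapply(oc$decision == "buy video", oc$group, mean)

# Observed statistic: control minus treatment.
obs_diff <- as.numeric(p_hat["control"] - p_hat["treatment"])
obs_diff  # 0.2, i.e. (56 - 41) / 75
```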

11.2.2 Variability of the statistic

summary(opportunity_cost)
       group             decision 
 control  :75   buy video    :97  
 treatment:75   not buy video:53  
  • Make 97 “buy video” cards and 53 “not buy video” cards.
  • Shuffle and deal into two piles of size 75, treatment and control
  • Compute simulated \(\hat{p}_C - \hat{p}_T\)
  • Repeat many times and keep track of the simulated statistics.

11.2.3 Observed statistic vs. null statistics

  • The observed statistic was 0.2.
  • In the simulated statistics, only 6 of 1000 null statistics were at least this extreme.
  • P-value: \(6/1000 = 0.006\)
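The shuffle-and-deal procedure for this study can be sketched in base R as follows. The object names and the seed are illustrative assumptions, and the cards are rebuilt from the counts in the summary. Because the shuffles are random, the simulated P-value will be close to, but not necessarily equal to, the 0.006 reported above.

```r
# 97 "buy video" cards and 53 "not buy video" cards, as in the summary.
cards <- rep(c("buy video", "not buy video"), times = c(97, 53))

# Observed statistic: control minus treatment.
obs_diff <- 56 / 75 - 41 / 75

set.seed(11)  # arbitrary seed for reproducibility

# Shuffle, deal into two piles of 75, and compute the simulated
# difference in proportions; repeat 1000 times.
null_stats <- replicate(1000, {
  shuffled <- sample(cards)
  control_pile <- shuffled[1:75]
  treatment_pile <- shuffled[76:150]
  mean(control_pile == "buy video") - mean(treatment_pile == "buy video")
})

# One-sided P-value: proportion of null statistics at least as large
# as the observed difference (tolerance guards against rounding error).
p_value <- mean(null_stats >= obs_diff - 1e-9)
p_value
```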